Modified Active Learning for Document Level Clustering
Similar resources
Clustering Document with Active Learning using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, including document categorization, topic indexing and information extraction. However, very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit Wikipedia and the semantic knowledge therein to facilitate clustering, enabling the automatic grouping of docum...
Exploiting Document Level Semantics in Document Clustering
Document clustering is an unsupervised machine learning method that separates a large, subject-heterogeneous collection (corpus) into smaller, more manageable, subject-homogeneous collections (clusters). Traditional methods of document clustering work by extracting textual features such as terms, sequences, and phrases from documents. These features are independent of each other and do not cat...
A Multi-level Approach for Document Clustering
The divisive MinMaxCut algorithm of Ding et al. [3] produces more accurate clustering results than existing document clustering methods. Multilevel algorithms [4, 1, 5, 7] have been used to speed up graph partitioning algorithms. We combine these two algorithms to construct a faster and more accurate algorithm. In this new algorithm, the original graph is coarsened, partitioned by the divi...
A Modified Fuzzy ART for Soft Document Clustering
Document clustering has become a very useful application in recent years, especially with the advent of the World Wide Web. Most existing document clustering algorithms either produce clusters of poor quality or are highly computationally expensive. In this paper we propose a document-clustering algorithm, KMART, that uses an unsupervised Fuzzy Adaptive Resonance Theory (Fuzzy-ART) neural network....
A Novel Modified Apriori Approach for Web Document Clustering
The traditional Apriori algorithm can be used to cluster web documents using the association-rule technique of data mining. However, this algorithm has several limitations due to repeated database scans and its weak association-rule analysis. With today's large databases, the efficiency of the traditional Apriori algorithm drops many times over. In this paper, we propose a new modified apriori...
Journal
Journal title: IARJSET
Year: 2017
ISSN: 2393-8021
DOI: 10.17148/iarjset/nciarcse.2017.29